2024-04-11
Intro
B.S. in Biochemistry
M.S. in Biomedical Engineering
Summer Internship
M.S. in Electrical and Computer Engineering
M.S. in Computer Science
Ph.D. in Machine Learning
Being excluded from activities
Receiving unsolicited advice
Others having unsympathetic or insensitive behaviors
Others failing to provide help
Wilson, Robert S et al. “Negative social interactions and risk of mild cognitive impairment in old age.” Neuropsychology vol. 29,4 (2015): 561-70. doi:10.1037/neu0000154
High Number of Interactions
Low Number of Interactions
E.g. Men living alone ->
2x cognitive decline in 10 years
How can we monitor people?
Personally Check in
Hire a Care Giver
Technology
https://www.peoplemanagement.co.uk/article/1747153/one-in-seven-workers-say-employer-monitoring-has-increased-during-covid
Audio is capable of capturing
Different modalities are capable of capturing different information
1. Can identify new incoming speakers and re-identify them
2. Can operate in real time as an online algorithm
[Flowchart: audio -> "Is this a new speaker?" -> yes: Enroll / Register new speaker; no: Identify speaker. Each branch outputs "Speaker = ...".]
Few-Shot Spkr ID
Prob. Graph. Models
Support Vector Machines
Neural Networks
Decision Trees
But this requires a lot of data!
Traditional Speaker Identification: learn how to do a task well
Meta-Learning: learn how to learn tasks well
[Figure: x-vector network. Input frames 1 ... T -> Time-Delay Neural Network layers (frame contexts t-2 ... t+2 and t-3 ... t+3) -> Stats Pooling -> DNN -> x-vector (Layers 1 to 8).]
Our work: 0.5 s of audio
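A minimal sketch of what a stats pooling layer does, assuming the common x-vector convention of concatenating the per-dimension mean and standard deviation over all T frames. The function name and the toy frames are illustrative and not taken from the slides.

```python
import math

def stats_pooling(frames):
    """Collapse T frame-level vectors into one fixed-size vector.

    Output = [per-dimension means] + [per-dimension standard deviations],
    so the length is 2 * dim regardless of how many frames came in.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds

# Three frames (T = 3) of 2-dimensional features.
frames = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
pooled = stats_pooling(frames)
print(len(pooled))  # twice the feature dimension
```

Because the pooled vector has a fixed size, the layers after pooling can work on audio of any duration, including very short segments.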
Episode 1
Randomly choose classes:
Support Set: used to create prototypes (i.e. centroids)
Query Set: used for training
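A rough sketch of how one training episode could be drawn, assuming each speaker has a pool of utterances to sample from. The names `sample_episode`, `n_way`, `n_support`, and `n_query` are hypothetical and the toy data is made up.

```python
import random

def sample_episode(data, n_way, n_support, n_query, rng):
    """Draw one episode.

    data: dict mapping a speaker id to a list of that speaker's items.
    Returns (support, query): two dicts keyed by the chosen speakers,
    with no item shared between the support and query sets.
    """
    classes = rng.sample(sorted(data), n_way)  # randomly choose classes
    support, query = {}, {}
    for c in classes:
        items = rng.sample(data[c], n_support + n_query)
        support[c] = items[:n_support]   # used to create prototypes
        query[c] = items[n_support:]     # used for training
    return support, query

rng = random.Random(0)
data = {f"spk{i}": [f"spk{i}_utt{j}" for j in range(10)] for i in range(6)}
support, query = sample_episode(data, n_way=3, n_support=2, n_query=4, rng=rng)
print(sorted(support))
```

Each new episode repeats the same draw with a fresh random choice of classes.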
Episode 2
Randomly choose classes:
Support Set: used to create prototypes (i.e. centroids)
Query Set: used for training
[Figure: x-vector network with Layer 7, Layer 8, and the DNN block separated from Input, Layers 1 to 6 (Time-Delay Neural Network, Stats Pooling), and the x-vector.]
[Figure: x-vector network reduced to Input and Layers 1 to 6 (Time-Delay Neural Network, Stats Pooling), producing the x-vector.]
Euclidean Distance
Assumption: the latent subspace creates features which have Gaussian-like characteristics
Show formula
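A minimal sketch of a prototypical loss with Euclidean distance, assuming the standard formulation: each prototype is the mean of a class's support embeddings, and a query is scored with a softmax over negative squared distances to the prototypes. That scoring matches the class posterior under isotropic Gaussians with a shared covariance and equal priors, which is one way to read the Gaussian-like assumption. Function names and the 2-D toy embeddings are illustrative only.

```python
import math

def prototype(vectors):
    """Centroid (per-dimension mean) of a list of embeddings."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def sq_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def prototypical_loss(support, query):
    """Mean negative log-probability of the true class over all queries.

    support, query: dicts mapping a label to a list of embeddings.
    """
    protos = {label: prototype(vecs) for label, vecs in support.items()}
    total, count = 0.0, 0
    for true_label, vecs in query.items():
        for q in vecs:
            logits = {lab: -sq_euclidean(q, p) for lab, p in protos.items()}
            m = max(logits.values())  # subtract the max for stability
            log_norm = m + math.log(sum(math.exp(v - m) for v in logits.values()))
            total += -(logits[true_label] - log_norm)
            count += 1
    return total / count

support = {"A": [[0.0, 0.0], [0.0, 2.0]], "B": [[4.0, 0.0], [4.0, 2.0]]}
good_loss = prototypical_loss(support, {"A": [[0.0, 1.0]], "B": [[4.0, 1.0]]})
bad_loss = prototypical_loss(support, {"A": [[4.0, 1.0]], "B": [[0.0, 1.0]]})
print(good_loss < bad_loss)
```

Queries that sit on their own class prototype give a loss near zero, while queries that sit on the wrong prototype are penalized heavily.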
Few-Shot Spkr ID
Intro
Number of Classes/Speakers
x-vector dimension
512-dim
128-dim
16-dim
Number of Samples in Support/Query Sets
0.5s
X-Vector System -> x-vectors -> prototypical loss
E.g. Reduction in computational footprint by 18%
Learns quickly, within 2.5 s of audio
Detecting New Classes
[Figure: data split for speakers with > 41 mins of audio (80% / 10% / 10%), alongside the X-Vector System.]
[Figure: speakers with > 41 mins of audio (split 80% / 10% / 10%, marked seen, with stats for Gaussians) and speakers with < 41 mins of audio; the groups are labelled Seen and Unseen.]
This greatly reduces computational resources
[Figure: the same data split with stats for Gaussians, showing seen samples and incoming samples of unknown class marked "?".]
Compute F1 scores
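A small sketch of an F1 computation for the seen / unseen decision. Which class is treated as positive, and the toy labels, are assumptions for illustration; the slides do not state them.

```python
def f1_score(y_true, y_pred, positive):
    """F1 = harmonic mean of precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = ["seen", "seen", "unseen", "unseen", "unseen"]
y_pred = ["seen", "unseen", "unseen", "unseen", "seen"]
f1_unseen = f1_score(y_true, y_pred, positive="unseen")
f1_seen = f1_score(y_true, y_pred, positive="seen")
print(round(f1_unseen, 3), round(f1_seen, 3))
```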
Created a method to detect new classes based on few-shot learning clustering
Detection works with under 2.5 s of audio
Zero-Shot & Adaptive ID
2D Data
3D Data
What about 4 dimensions? 6 dimensions? 32 dimensions?
Our desired x-vectors have 32 dimensions!
We can use t-SNE to check for qualitative indications that the x-vectors have formed clusters
Zero-Shot & Adaptive ID
Few-Shot Spkr ID
Intro
Detecting New Classes
edixl
fkvvo
[Flowchart: x-vectors -> "Is this a new speaker?" -> yes: Enroll / Register new speaker; no: Identify speaker. Each branch outputs "Speaker = ...".]
This setup has many caveats!
Problem 1: The system does not know the actual labels, because it creates its own predicted labels
Solution: matching with the Hungarian Algorithm
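A hedged sketch of the matching step, assuming predicted clusters are matched to true speakers by maximizing their total overlap under a one-to-one constraint. For brevity it brute-forces every permutation, which gives the same optimum the Hungarian algorithm finds efficiently; in practice `scipy.optimize.linear_sum_assignment` is the usual choice. The overlap table is made up.

```python
from itertools import permutations

def best_one_to_one(overlap):
    """Optimal one-to-one matching for a small square overlap table.

    overlap[i][j] = number of segments where predicted cluster i
    co-occurs with true speaker j. Returns (mapping, score), where
    mapping[i] is the true speaker assigned to predicted cluster i.
    """
    n = len(overlap)
    best_perm, best_score = None, -1
    for perm in permutations(range(n)):
        score = sum(overlap[i][perm[i]] for i in range(n))
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score

overlap = [
    [8, 1, 0],
    [7, 6, 1],  # cluster 1 overlaps most with speaker 0, which is taken
    [0, 2, 9],
]
mapping, score = best_one_to_one(overlap)
print(mapping, score)
```

Cluster 1 would prefer speaker 0, but the one-to-one constraint hands speaker 0 to cluster 0 and gives cluster 1 its second choice, which yields the best total.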
[Figure: assignment-cost example for the Hungarian algorithm, with costs $10, $40, $50, $50, $80, $80, $50, $70, $60.]
There will be leftover classes when using the Hungarian algorithm
Greedy algorithms will use up every predicted class found!
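One plausible reading of the greedy matching, sketched under the assumption that every predicted cluster is simply assigned to the true speaker it overlaps with most, so several clusters may share one speaker. The function name and the overlap table are illustrative.

```python
def greedy_many_to_one(overlap):
    """Assign each predicted cluster to its best-overlapping true speaker.

    overlap[i][j] = overlap between predicted cluster i and true speaker j.
    No one-to-one constraint: every predicted cluster gets a label.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in overlap]

overlap = [
    [8, 1, 0],
    [7, 6, 1],
    [0, 2, 9],
    [1, 0, 5],  # a fourth predicted cluster, more clusters than speakers
]
mapping = greedy_many_to_one(overlap)
print(mapping)
```

Here all four predicted clusters receive a speaker label, whereas a one-to-one matching against three speakers would leave one cluster unmatched.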
[Figure: sequence of predicted cluster labels, A through L.]
This is VERY segmented!
[Figure: the predicted cluster labels A through L, matched with the Hungarian Algorithm.]
[Figure: the predicted cluster labels A through L, matched with the Greedy Algorithm.]
[Figure: Speaker vs. Time (s), three panels each comparing the true label with the predicted label.]
[Flowchart: Mahalanobis Classifier. x-vectors -> xvec queue -> compute joint Mahalanobis distances to the closest cluster -> test on the joint Mahalanobis distance to the closest cluster (true / false) -> either create a new class and classify as new cluster, or classify as closest cluster -> Class.]
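A minimal sketch of the decision in the flowchart, under stated assumptions: the "joint" distance is taken to be the sum of squared Mahalanobis distances over the queued x-vectors, all clusters share one covariance matrix, and a fixed threshold decides between "closest cluster" and "new class". None of these details are confirmed by the slides, and the 2-D vectors stand in for real x-vectors.

```python
def solve(a, b):
    """Solve a @ x = b for a small square matrix (Gaussian elimination)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def sq_mahalanobis(x, mean, cov):
    """(x - mean)^T cov^{-1} (x - mean), without forming the inverse."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    return sum(d * s for d, s in zip(diff, solve(cov, diff)))

def classify_queue(queue, clusters, cov, threshold):
    """Return the closest cluster's label, or None to signal a new class."""
    joint = {
        label: sum(sq_mahalanobis(x, mean, cov) for x in queue)
        for label, mean in clusters.items()
    }
    closest = min(joint, key=joint.get)
    return closest if joint[closest] <= threshold else None

clusters = {"spk0": [0.0, 0.0], "spk1": [5.0, 5.0]}
cov = [[1.0, 0.5], [0.5, 2.0]]
known = classify_queue([[0.2, -0.1], [-0.3, 0.4]], clusters, cov, threshold=10.0)
new = classify_queue([[10.0, -4.0], [9.5, -4.2]], clusters, cov, threshold=10.0)
print(known, new)
```

When the function returns None, the caller would create a new class from the queued vectors, mirroring the "create new class" branch.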
[Figure: Baseline results for the entire test set, Hungarian vs. Greedy.]
Zero-Shot & Adaptive ID
Few-Shot Spkr ID
Intro
Detecting New Classes
Hungarian
Baseline
Conclusion:
Results for entire test set
Experiment 2: Using 41min Covariance as Model Cov
Remember how in VoxCeleb1, we had 29 speakers with more than 41 mins of audio?
[Figure: the 29 speakers with > 41 mins of audio (80% / 10% / 10% split, seen and unseen, stats for Gaussians) next to the Mahalanobis Classifier flowchart.]
[Figures: results using the 41min covariance: Baseline vs. Using 41min Covariance; Hungarian Matching; Hungarian vs. Greedy.]
Experiment 3: 5s Initial Covariance Adaptation
[Flowchart: the Mahalanobis Classifier with an added Covariance Adaptation stage.]
[Flowchart: Covariance Adaptation. Collect the x-vectors from the xvec queue into a Collection (true / false checks), then train a covariance matrix on the Collection to obtain the Trained Covariance.]
[Figures: results for the entire test set with 5s Initial Covariance Adaptation: Baseline comparison; Hungarian Matching; Hungarian vs. Greedy.]
Sample Mean at time T
Sample Covariance at time T
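A sketch of how the sample mean and sample covariance at time T can be maintained one x-vector at a time, assuming a Welford-style recursive update. The slides do not show the exact update rule, so the class below is illustrative only.

```python
class RunningStats:
    """Running sample mean and covariance, updated one vector at a time."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        # Accumulated outer products of deviations (the covariance numerator).
        self.m2 = [[0.0] * dim for _ in range(dim)]

    def update(self, x):
        self.n += 1
        delta_old = [xi - mi for xi, mi in zip(x, self.mean)]  # vs. old mean
        self.mean = [mi + d / self.n for mi, d in zip(self.mean, delta_old)]
        delta_new = [xi - mi for xi, mi in zip(x, self.mean)]  # vs. new mean
        for i, di in enumerate(delta_old):
            for j, dj in enumerate(delta_new):
                self.m2[i][j] += di * dj

    def covariance(self):
        """Unbiased sample covariance; needs at least two samples."""
        return [[v / (self.n - 1) for v in row] for row in self.m2]

stats = RunningStats(dim=2)
for x in [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]:
    stats.update(x)
print(stats.mean, stats.covariance())
```

No past x-vectors need to be stored: each update only touches the count, the mean, and the accumulated matrix.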
Experiment 4: Algorithmic Stats
[Flowchart: the Mahalanobis Classifier with an added Update step on the Class output.]
[Figures: results for the entire test set with Algorithmic Statistics: Baseline comparison; Hungarian Matching; Hungarian vs. Greedy.]
Experiment 5: 5s Cov. Adapt + Algorithmic Stats
[Flowchart: the Mahalanobis Classifier with both Covariance Adaptation and the Update step.]
[Figures: results for the entire test set with 5s Cov. Adapt + Algorithmic Stats: Baseline comparison; Hungarian Matching; Hungarian vs. Greedy.]
Experiment 6: 5s Cov. Adapt + Algorithmic Mean
[Flowchart: the Mahalanobis Classifier with Covariance Adaptation and the Update step.]
[Figures: results for the entire test set with 5s Cov. Adapt + Algorithmic Mean: Baseline comparison; Hungarian Matching; Hungarian vs. Greedy.]
[Figures: results for the entire test set, shown as letter-value plots.]
Identify and continually adapt vocal embeddings of voices it's never heard before
Real-Time Platform
Intel i7-4770
4 cores / 8 threads
@ 3.40 GHz
June 2013
32 GB
[Figure: real-time platform. Four Recorders feed the Speaker decision; a true / false test selects between Speaker = New Speaker (Register New Class/Speaker) and Speaker = Closest Cluster, followed by an update of the speaker distr.]
Front-End Dashboard
[Flowchart: Audio -> x-vector system -> x-vectors -> Mahalanobis Classifier (compute joint Mahalanobis distances to the closest cluster; true / false; create new class and classify as new cluster, or classify as closest cluster) with Update and Covariance Adaptation -> Class. In the last build, Audio and the x-vector system are replaced by generic Inputs.]
Inputs
Media Entertainment
Journalism
Object Recognition
Bioinformatics
Meetings
Data Mining
Zero-Shot & Adaptive ID
Few-Shot Spkr ID
Intro
Detecting New Classes
?
?
?
seen
seen
seen
Speaker
Register New Class/Speaker
Speaker = New Speaker
Speaker = Closest Cluster
Update
speaker distr.
false
true
Front-End Dashboard
joint Maha. dists to closest cluster
k = closest cluster
k = new cluster
Mahalanobis Classifier
Cov
Adapt
new class
F
T
Intro
Line of Best Fit
You have all the data from the beginning
You receive the data one piece at a time
Intro
Line of Best Fit
You have all the data from the beginning
You can have all the data but load one piece of data at a time
Real-Time data
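A small sketch of the online case for a line of best fit, assuming ordinary least squares on (x, y) pairs. Only running sums are kept, so each new point updates the fit without revisiting earlier data. The class name and toy data are illustrative.

```python
class OnlineLineFit:
    """Least-squares line y = slope * x + intercept from streaming points."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, x, y):
        # Receive the data one piece at a time: only the sums change.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def coefficients(self):
        """Return (slope, intercept); needs two or more distinct x values."""
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept

fit = OnlineLineFit()
for x in range(5):
    fit.update(float(x), 2.0 * x + 1.0)  # points on the line y = 2x + 1
slope, intercept = fit.coefficients()
print(slope, intercept)
```

Feeding the same points in one batch gives the same sums, so the batch and online fits agree; the difference is only in when the data becomes available.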
Intro
Sensor Localization
Intro
There are a few steps that needed to be accomplished:
& Good Initial Conds
Assumption:
Large hallway cross-sections lead to larger rooms than small hallway cross-sections
System Infrastructure
1. Server Descriptor
{
    "PORT": 5050,
    "HEADER": 64,
    "FORMAT": "utf-8",
    "DISCONNECT_MSG": "DISCONNECT",
    "logging_format": "%(asctime)s - %(message)s",
    "logging_level": "info"
}
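How audiosockets uses these fields is not shown on the slides. The sketch below assumes a common socket convention: every message is preceded by a fixed-size header of HEADER bytes that carries the message length, text is encoded with FORMAT, and DISCONNECT_MSG is sent to close a connection. The helper names `frame` and `unframe` are hypothetical.

```python
import json

# Subset of the server descriptor needed for message framing.
config = json.loads("""
{
    "PORT": 5050,
    "HEADER": 64,
    "FORMAT": "utf-8",
    "DISCONNECT_MSG": "DISCONNECT"
}
""")

def frame(message, config):
    """Prefix a message with a fixed-size, space-padded length header."""
    body = message.encode(config["FORMAT"])
    header = str(len(body)).encode(config["FORMAT"])
    header += b" " * (config["HEADER"] - len(header))
    return header + body

def unframe(packet, config):
    """Read the length from the header, then decode exactly that many bytes."""
    size = config["HEADER"]
    length = int(packet[:size].decode(config["FORMAT"]))
    return packet[size:size + length].decode(config["FORMAT"])

packet = frame("hello", config)
print(len(packet), unframe(packet, config))
```

With a fixed-size header the receiver always knows how many bytes to read next, which keeps messages from running together on a stream socket.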
2. Start up a server
from audiosockets import MailmanSocket
mailman = MailmanSocket("server_info.json")
mailman.start()
3. Start Recorder Client
from audiosockets import RecorderSocket
recorder = RecorderSocket("server_info.json")
recorder.start()
4. Start a Processor
from audiosockets import BaseProcessorSocket
from audiosockets.utils import LogMelSpectrogram

class LogMelSpecProcessor(BaseProcessorSocket):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def process_data(self, data):
        # Each message carries the sampling rate and the audio samples.
        fs = data["fs"]
        audio = data["data"]
        # Convert the audio to a log-mel spectrogram and print its shape.
        lms = LogMelSpectrogram(fs)(audio)
        print(lms.shape)

processor = LogMelSpecProcessor("VAD", "server_info.json")
processor.start()
Speaker Identification
speakers with > 41 mins
80%
10%
seen
seen
seen
10%
Stats for Gaussians
unseen
unseen
?
?
?
What happens if we vary the number of speakers enrolled?
29 Speakers
[Figure: sequence of predicted speaker labels, mostly A and B with a few C.]
Brighter colors indicate later stages